Evolution in DDD use of Diabetes Drugs over time

code
analysis
exploration
cost
drugs
Author

Henrik Vitus Bering Laursen

Published

September 30, 2024

Drug utilisation over time

My two previous posts examined data available from medstat and were focused on those datasets and specifically the turnover of specific drug classes.

Here, I wish to demonstrate a different type of publicly available data, still related to prescription data.

This is data regarding drug prices, and can be found on The Danish Health Data Authority’s website here.

Danish drug prices are by and large renegotiated every 14 days. Therefore, the dataset is updated accordingly. I will use the data from the most recent update as of the start of making this post.

Looking at it post hoc, I used the following packages:

library(tidyverse)
library(readxl)
library(httr)
library(stringr)
library(extrafont)
# install.packages("ggpubr")
# library(gridExtra)
# library(ggpubr)

Fetching the data

I had some initial trouble figuring out how to download the file. What kept me from figuring it out was not realising that the file from the link was a zip file containing an excel file.

Apparently, “The extension .ashx doesn’t specify the file format but rather that a server-side handler is being used to serve the content.”, as chatGPT puts it. I have no gained the understanding that .ashx is just some sort of framework that serves up the data, not a file type.

This leads to the following code necessary to download the data. Code is just for show.

# URL of the .ashx file 
url <- "https://www.esundhed.dk/-/media/Files/Publikationer/Emner/Laegemidler/Medicinpriser/2024/lmpriser_eSundhed_240916.ashx"

# Define a path in my local datafolder, used as described in the first post about medstat data
datapath <- "C:/Users/henri/Documents/data-publicdataprojects"

# Define the path where the ZIP file will be saved
zip_destfile <- paste0(datapath,"lmpriser_eSundhed_240916.zip")

# Download the ZIP file
download.file(url, zip_destfile, mode = "wb")

# Unzip the file
unzip(zip_destfile, exdir = paste0(datapath,"unzipped_files"))

# Check the unzipped files
list.files(paste0(datapath,"unzipped_files"))

# check which sheet to import from the excel file
sheet_names <- excel_sheets(paste0(datapath,"unzipped_files/lmpriser_eSundhed_240916.xlsx"))

And then I import the file:

# Read the excel file
datapath <- "C:/Users/henri/Documents/data-publicdataprojects"
df_ddd <- read_excel(paste0(datapath,"unzipped_files/lmpriser_eSundhed_240916.xlsx"), sheet = "lmpriser_eSundhed_240916")

head(df_ddd)
# A tibble: 6 × 144
  ATC    Lægemiddel Indholdsstof Varenummer Pakning Styrke Form  Firma Indikator
  <chr>  <chr>      <chr>        <chr>      <chr>   <chr>  <chr> <chr> <chr>    
1 A01AA… Duraphat   Natriumfluo… 066411     51 g    5 mg/g tand… Colg… AIP      
2 A01AA… Duraphat   Natriumfluo… 066411     51 g    5 mg/g tand… Colg… AUP      
3 A01AA… Duraphat   Natriumfluo… 066411     51 g    5 mg/g tand… Colg… DDD      
4 A01AA… Duraphat   Natriumfluo… 066411     51 g    5 mg/g tand… Colg… AUP_pr_D…
5 A01AA… Duraphat   Natriumfluo… 066422     3 x 51… 5 mg/g tand… Colg… AIP      
6 A01AA… Duraphat   Natriumfluo… 066422     3 x 51… 5 mg/g tand… Colg… AUP      
# ℹ 135 more variables: `20190729` <dbl>, `20190812` <dbl>, `20190826` <dbl>,
#   `20190909` <dbl>, `20190923` <dbl>, `20191007` <dbl>, `20191021` <dbl>,
#   `20191104` <dbl>, `20191118` <dbl>, `20191202` <dbl>, `20191216` <dbl>,
#   `20191230` <dbl>, `20200113` <dbl>, `20200127` <dbl>, `20200210` <dbl>,
#   `20200224` <dbl>, `20200309` <dbl>, `20200323` <dbl>, `20200406` <dbl>,
#   `20200420` <dbl>, `20200504` <dbl>, `20200518` <dbl>, `20200601` <dbl>,
#   `20200615` <dbl>, `20200629` <dbl>, `20200713` <dbl>, `20200727` <dbl>, …

Cleaning the data

Just as the post, I want to focus on drugs used for diabetes. So I keep only the observations with A10A and A10B:

# Cleaning it in one set of operations
df_ddd_A10 <- df_ddd |> 
# Filter for just A10A and A10B, and keep just prp per ddd 
  filter(
    str_detect(ATC,("^A10B")) | str_detect(ATC,("^A10A")), 
    Indikator == "AUP_pr_DDD") |> 
  # Select variables
  select(ATC, Indholdsstof, Lægemiddel, starts_with("20")) |>
  # Pivot the data so that the variable columns that contain time are contained in one variable column
  pivot_longer(
    cols = starts_with("20"),
    names_to = "Tid",
    values_to = "prpddd") |> 
  mutate(
    Tid = ymd(Tid)) |>
  # group_by(tid) |>  DELETE??
  # mutate(
  #   hip_ddd = max(prpddd), # hip_ddd = maxpris per ddd
  #   lop_ddd = min(prpddd) # lop_ddd = minpris per ddd
  # ) |> 
  # ungroup() |> 
  filter(
    !is.na(prpddd)
  )

# And translate the colnames to english for good measure
head(df_ddd_A10)
# A tibble: 6 × 5
  ATC     Indholdsstof    Lægemiddel Tid        prpddd
  <chr>   <chr>           <chr>      <date>      <dbl>
1 A10AB01 Insulin (human) Actrapid   2019-07-29   6.48
2 A10AB01 Insulin (human) Actrapid   2019-08-12   6.48
3 A10AB01 Insulin (human) Actrapid   2019-08-26   6.48
4 A10AB01 Insulin (human) Actrapid   2019-09-09   6.48
5 A10AB01 Insulin (human) Actrapid   2019-09-23   6.48
6 A10AB01 Insulin (human) Actrapid   2019-10-07   6.48
colnames(df_ddd_A10) <- c(
  "atc",
  "compound",
  "product",
  "time",
  "prpddd" # Pharmacy Retail Price DDD
)
head(df_ddd_A10)
# A tibble: 6 × 5
  atc     compound        product  time       prpddd
  <chr>   <chr>           <chr>    <date>      <dbl>
1 A10AB01 Insulin (human) Actrapid 2019-07-29   6.48
2 A10AB01 Insulin (human) Actrapid 2019-08-12   6.48
3 A10AB01 Insulin (human) Actrapid 2019-08-26   6.48
4 A10AB01 Insulin (human) Actrapid 2019-09-09   6.48
5 A10AB01 Insulin (human) Actrapid 2019-09-23   6.48
6 A10AB01 Insulin (human) Actrapid 2019-10-07   6.48

Prepping data for plotting

The resulting dataset is one with all registered Pharmacy Retail Price per Defined Daily Dose (PRP per DDD) of Drugs in the A10 ATC category, from between 2019-07-29 and 2024-09-16. In other words: from the past 5.1362081 years.

This is basically a measure of how much it costs to treat a person with the medication with a standard dose, each day, and can be used to compare how much each drug costs to use for treatment. This is used because it can be difficult to compare drugs based on redeemed prescriptions or prices, as some drugs are prescribed differently than others. Also, some drugs have absurdly high prices per pill, as some specialised drugs in oncology and opthalmology, compared to very cheap generic pain killers.

This is exemplified in the code below.

df_ddd_test <- df_ddd |> 
  filter(Indikator == "AUP") |> # AUP = Pharmacy Retail Price
  pivot_longer(
    cols = starts_with("20"),
    names_to = "Tid",
    values_to = "prp"
    ) |> 
    summarise(
    highest_prp = max(prp, na.rm = TRUE),
    lowest_prp = min(prp, na.rm = TRUE)
  )
df_ddd_test
# A tibble: 1 × 2
  highest_prp lowest_prp
        <dbl>      <dbl>
1    9999990.       7.25

I want to present the numbers with inline code, and this can be done easily with the basic R function format() and cat() which can format the numeric value and concatenate the text.

# Format the numbers in base R
formatted_highest_prp <- format(df_ddd_test$highest_prp, big.mark = ",", scientific = FALSE)
formatted_lowest_prp <- format(df_ddd_test$lowest_prp, big.mark = ",", scientific = FALSE)

# Print the formatted numbers
cat("Highest AUP:", formatted_highest_prp, "DKK\n")
Highest AUP: 9,999,990 DKK
cat("Lowest AUP:", formatted_lowest_prp, "DKK\n")
Lowest AUP: 7.25 DKK

Below, the numbers are generated with inline code:

The highest Pharmacy Retail Price is NULL, and the lowest is NULL, which is QUITE a difference.

Making a plot

Now, back to the retail price per DDD of diabetes drugs. Just as a visual aid, lets compare the most expensive to the cheapest, of the diabetes drugs.

But before that, a useful thing I picked up from Meghan Hall’s blog, is putting a consistent theme on your plots. This can be done as below.

library(gghighlight)
library(scales)

Attaching package: 'scales'
The following object is masked from 'package:purrr':

    discard
The following object is masked from 'package:readr':

    col_factor
library(ggtext)
library(ggrepel)

hen_theme <- function () { 
  theme_linedraw(base_size=11) %+replace%  
    theme(
      panel.background  = element_blank(),
      plot.background = element_rect(fill = "white", color = NA), 
      legend.background = element_rect(fill = "white", color = NA),
      legend.key = element_rect(fill = "white", color = NA),
      axis.ticks = element_blank(),
      panel.grid.major = element_line(color = "grey90", size = 0.3), 
      panel.grid.minor = element_blank(),
      plot.title.position = "plot",
      plot.title = element_text(size = 16, hjust = 0, vjust = 0.5, 
                                margin = margin(b = 0.2, unit = "cm")),
      plot.subtitle = element_text(size = 10, hjust = 0, vjust = 0.5, 
                                   margin = margin(b = 0.4, unit = "cm")),
      plot.caption = element_text(size = 7, hjust = 1, face = "italic", 
                                  margin = margin(t = 0.1, unit = "cm")),
      axis.text.x = element_text(size = 13),
      axis.text.y = element_text(size = 13)
    )
}

The “hen_theme” is then added to future plots, at the end.

# find most expensive 
df_ddd_A10 |> 
  group_by(atc) |> 
  summarise(
    highest_prpddd = max(prpddd, na.rm = TRUE),
    lowest_prpddd = min(prpddd, na.rm = TRUE)
  ) |>   
  summarise(
    most_expensive_atc = atc[which.max(highest_prpddd)],
    most_expensive_value = max(highest_prpddd),
    least_expensive_atc = atc[which.min(lowest_prpddd)],
    least_expensive_value = min(lowest_prpddd)
  )
# A tibble: 1 × 4
  most_expensive_atc most_expensive_value least_expensive_atc
  <chr>                             <dbl> <chr>              
1 A10BX16                            522. A10BB12            
# ℹ 1 more variable: least_expensive_value <dbl>
# plot
hilo_ddd <- df_ddd_A10 |> 
  filter(atc == "A10BX16" | atc == "A10BB12") |> 
  select(compound,time,prpddd)
  
ggplot(hilo_ddd, mapping = aes(x=time,y=prpddd, colour = compound)) +
  geom_point() +
  labs(
    title = "Cheapest and most expensive A10 drugs",
    subtitle = "by Pharmacy Retail Price per DDD",
    y = "Pharmacy Retail Price per DDD",
    x = "",
  ) +
  hen_theme() + 
  theme(
    legend.title = element_blank(),
    legend.position = c(0.8,0.9),
    legend.background = element_rect(fill= "white")
    )
Warning: The `size` argument of `element_line()` is deprecated as of ggplot2 3.4.0.
ℹ Please use the `linewidth` argument instead.
Warning: A numeric `legend.position` argument in `theme()` was deprecated in ggplot2
3.5.0.
ℹ Please use the `legend.position.inside` argument of `theme()` instead.

This plot tells us that the two drug compounds with the highest and lowest PRP per DDD are Glimepiride, of the sulfonyurea class, and Tirzepatide, a GIP and GLP1 combination.

Now, I wish to visualise how PRP per DDD changes over time. Below, I create the variables for the minimum and maximum values for each compound.

# Make a new var thats floored to months and find the min for each month
df_ddd_A10_minmax <- df_ddd_A10 |> 
  mutate(month = floor_date(time,"month")) |> 
  group_by(month, atc, compound) |> 
  summarise(
    a_min_prpddd = min(prpddd, na.rm = TRUE), # "a_" is to make it first in the facetwrap later 
    b_max_prpddd = max(prpddd, na.rm = TRUE)) |> 
  ungroup()
`summarise()` has grouped output by 'month', 'atc'. You can override using the
`.groups` argument.
head(df_ddd_A10_minmax)
# A tibble: 6 × 5
  month      atc     compound         a_min_prpddd b_max_prpddd
  <date>     <chr>   <chr>                   <dbl>        <dbl>
1 2019-07-01 A10AB01 Insulin (human)          6.07         9.90
2 2019-07-01 A10AB04 Insulin lispro           9.8         12.2 
3 2019-07-01 A10AB05 Insulin aspart           9.66        13.1 
4 2019-07-01 A10AB06 Insulin glulisin         9.35         9.55
5 2019-07-01 A10AC01 Insulin (human)          6.07        10.9 
6 2019-07-01 A10AD01 Insulin (human)          6.34        10.9 

And now for the grand plot I have planned for this post, which will be an overview of the PRP per DDD for the main drug classes in diabetes type 2 treatment.

First, I limit the dataset to only encompass the classes I want to look at.

dfforplot <- df_ddd_A10_minmax |> 
  filter(
    str_detect(atc,"^A10BA") |   # met
      str_detect(atc,"^A10BB") | # sul
      str_detect(atc,"^A10BK") | # sglt2
      str_detect(atc,"^A10BJ") | # glp1
      str_detect(atc,"^A10BH") | # dpp4
      str_detect(atc,"^A10BX16") # tirzepatid, honoured guest
    )  

Then, I try to make the plots which contain a surmountable amount of information.

# Minimum values plot
plotmin <- ggplot(dfforplot) +
  geom_line(aes(x = month, y = a_min_prpddd, colour = compound), linewidth = 0.8) +
  labs(title = "Minimum PRP per DDD Over Time",
       x = "",
       y = "PRP per DDD",
       colour = "Drug") +
  hen_theme()
# Max
plotmax <- ggplot(dfforplot) +
  geom_line(aes(x = month, y = b_max_prpddd, colour = compound), linewidth = 0.8) +
  labs(title = "Maximum PRP per DDD Over Time",
       x = "",
       y = "PRP per DDD",
       colour = "Drug") +
  hen_theme()

plotmin

plotmax

Now that is quite the jumble of lines.

I want to do the following:

  • Create a more meaningful colour representation
  • Look at the outliers separately from the ones staying below 50 and 100, respectively.

First, meaningful colour representation.

The exact amount of different compounds within each class can be found via the code below.

dfforplot |> group_by(atc) |> distinct(compound)
# A tibble: 20 × 2
# Groups:   atc [20]
   atc     compound     
   <chr>   <chr>        
 1 A10BA02 Metformin    
 2 A10BB01 Glibenclamid 
 3 A10BB07 Glipizid     
 4 A10BB09 Gliclazid    
 5 A10BB12 Glimepirid   
 6 A10BH01 Sitagliptin  
 7 A10BH02 Vildagliptin 
 8 A10BH03 Saxagliptin  
 9 A10BH04 Alogliptin   
10 A10BH05 Linagliptin  
11 A10BJ01 Exenatid     
12 A10BJ02 Liraglutid   
13 A10BJ03 Lixisenatid  
14 A10BJ05 Dulaglutid   
15 A10BJ06 Semaglutid   
16 A10BK01 Dapagliflozin
17 A10BK02 Canagliflozin
18 A10BK03 Empagliflozin
19 A10BK04 Ertugliflozin
20 A10BX16 Tirzepatid   

Now we can setup a vector with colour mapping using the output from above. Copy and paste it into a text editor to convert it to something useful. There are probably smarter ways to use that output.

# prep colour scheme for drugs

colour_mapping <- c(
  # Unique drug
  "Metformin"       = "#000000",  # Deep Blue (for uniqueness)

  # Sulfonylureas (reddish colors)
  "Glibenclamid"    = "#d62728",  # Red
  "Glipizid"        = "#e37777",  # Light Red
  "Gliclazid"       = "#c13515",  # Darker Red
  "Glimepirid"      = "#ff6347",  # Tomato Red

  # DPP-4 inhibitors (greenish colors)
  "Sitagliptin"     = "#2ca02c",  # Green
  "Vildagliptin"    = "#98df8a",  # Light Green
  "Saxagliptin"     = "#34a56f",  # Teal Green
  "Alogliptin"      = "#57a774",  # Medium Green
  "Linagliptin"     = "#1e7f5f",  # Dark Green

  # GLP-1 receptor agonists (bluish colors)
  "Exenatid"        = "#1f77b4",  # Deep Blue
  "Liraglutid"      = "#5b9bd5",  # Light Blue
  "Lixisenatid"     = "#6495ed",  # Cornflower Blue
  "Dulaglutid"      = "#4682b4",  # Steel Blue
  "Semaglutid"      = "#4169e1",  # Royal Blue

  # SGLT2 inhibitors (yellowish and brownish colors)
  "Dapagliflozin"   = "#ffd700",  # Gold
  "Canagliflozin"   = "#e5b33f",  # Dark Yellow
  "Empagliflozin"   = "#d9a120",  # Mustard
  "Ertugliflozin"   = "#b8860b",  # Dark Goldenrod

  # Tirzepatide (distinct color)
  "Tirzepatid"      = "#8b008b"   # Dark Magenta
)

Now to look at the ones who are not outliers separately to see if the graph is visually meaningful.

testplot1 <- ggplot(dfforplot |> filter(a_min_prpddd<100), aes(x = month, y = a_min_prpddd, colour = compound)) +
  geom_line() +
  scale_colour_manual(values = colour_mapping) +
  scale_x_continuous(breaks = seq.Date(as.Date("2020-01-01"), as.Date("2025-12-01"), by = "year"),
                     labels = format(seq.Date(as.Date("2020-01-01"), as.Date("2025-12-01"), by = "year"), "%Y")) +
  labs(title = "Minimum Pharmacy Retail Price per DDD, in DKK",
       x = "",
       y = "",
       colour = "Drug") +
  hen_theme() 

testplot1

First of all, this is a mess. In addition to looking at the outliers separately, I want to do the following:

  1. Sort the legend according to drug class (atc codes)
  2. Reduce the amount of compounds included

For (1), y’all need to hold on for dear life because this stuff is an amalgamation of what I have done in my PhD combined with some new things. I will have to explain each part of the code in great detail to even remember how this works.

The explanations are maybe mostly for me.

To achieve that, I will firstly make a variable with a label for each drug class. Then, I will order the compound variable by the class variable. Then I apply cus tom colours to the compound variable.

The first part is relatively simply done using case_when() and knowledge of what classes the ATC codes represent.

dfforplot <- dfforplot |> 
  mutate(class = case_when(
    substr(atc, 1, 5) == "A10BA" ~ "Metformin",
    substr(atc, 1, 5) == "A10BB" ~ "Sulfonylureas",
    substr(atc, 1, 5) == "A10BH" ~ "DPP4",
    substr(atc, 1, 5) == "A10BJ" ~ "GLP1",
    substr(atc, 1, 5) == "A10BK" ~ "SGLT2",
    substr(atc, 1, 5) == "A10BX" ~ "Tirzepatide",
    TRUE ~ "Other"  # For any other unclassified codes
  ))

For the second part, I set the order I want, turn the class variable into a factor, and then arrange() the compounds within each class according to this order. Finally, I turn the compound variable into a factor that is ordered by its unique levels, which correspond to the classes, and check that compound has the correct order.

class_order <- c("Metformin", "Sulfonylureas", "DPP4", "GLP1", "SGLT2", "Tirzepatide")
dfforplot <- dfforplot |>
  mutate(class = factor(class, levels = class_order))  |>
  arrange(class, compound) |>
  mutate(compound = factor(compound, levels = unique(compound)))
print(levels(dfforplot$compound))
 [1] "Metformin"     "Glibenclamid"  "Gliclazid"     "Glimepirid"   
 [5] "Glipizid"      "Alogliptin"    "Linagliptin"   "Saxagliptin"  
 [9] "Sitagliptin"   "Vildagliptin"  "Dulaglutid"    "Exenatid"     
[13] "Liraglutid"    "Lixisenatid"   "Semaglutid"    "Canagliflozin"
[17] "Dapagliflozin" "Empagliflozin" "Ertugliflozin" "Tirzepatid"   

The third part is simply remembering the order of classes, and how many compounds are within each class, and then creating a selection of colours via the colorRampPallette() function, and then store that in the object colours_compunds, to use in the plot.

met_colours  <- colorRampPalette(c("black"))(1)
su_colours  <- colorRampPalette(c("pink","darkred"))(4)
dpp4_colours   <- colorRampPalette(c("green","darkgreen"))(5) # NB THIS WORKS BECAUSE THE FIRST 5 ARE DPP4, 6 are GLP1, etc etc.
glp1_colours   <- colorRampPalette(c("lightblue","darkblue"))(5)
sglt2_colours  <- colorRampPalette(c("yellow","#b8860b"))(4)
tir_colours  <- colorRampPalette(c("#8b008b"))(1)
colours_compounds <- c(met_colours,
                su_colours,
                dpp4_colours,  
                glp1_colours,
                sglt2_colours,
                tir_colours) # combine
barplot(rep(1,20), col=colours_compounds, border = "white", axes = FALSE)

For (2), I will focus on the three most sold compounds in the latest year and remove the rest to avoid cluttering the plot with information.

I use Medstat as a source, and what I look for is DDD sold in total across the entire healthcare sector, which is incredible easy to extract from the site. The amounts are from when I viewed the site in October 2024. Below, I store a vector of the most three most used compounds (less if class has less compounds).

kept_compounds <- c(
  "Metformin", 
  "Glipizid",
  "Gliclazid",
  "Glimepirid",
  "Sitagliptin",
  "Vildagliptin",
  "Linagliptin",
  "Liraglutid",
  "Dulaglutid",
  "Semaglutid",
  "Dapagliflozin",
  "Canagliflozin",
  "Empagliflozin",
  "Tirzepatid"  
)

Now, lets make plots for the ones that are not outliers. In practice I will do that by just limiting the y axis.

dfforplot |> 
  filter(a_min_prpddd<100
         & compound %in% kept_compounds) |> 
  ggplot(aes(x = month, y = a_min_prpddd, colour = compound)) +
  geom_line() +
  scale_colour_manual(values = colours_compounds) +
  scale_x_continuous(breaks = seq.Date(as.Date("2020-01-01"), as.Date("2025-12-01"), by = "year"),
                     labels = format(seq.Date(as.Date("2020-01-01"), as.Date("2025-12-01"), by = "year"), "%Y")) +
  labs(title = "Minimum Pharmacy Retail Price per DDD, in DKK",
       x = "",
       y = "",
       colour = "Drug") +
  hen_theme() 

Oops, I messed up. The colours do not fit anymore. Will fit the colours to the new more limited amount of compounds.

met_colours  <- colorRampPalette(c("black"))(1)
su_colours  <- colorRampPalette(c("pink","darkred"))(3)
dpp4_colours   <- colorRampPalette(c("green","darkgreen"))(3) # NB THIS WORKS BECAUSE THE FIRST 5 ARE DPP4, 6 are GLP1, etc etc.
glp1_colours   <- colorRampPalette(c("lightblue","darkblue"))(3)
sglt2_colours  <- colorRampPalette(c("yellow","#b8860b"))(3)
tir_colours  <- colorRampPalette(c("#8b008b"))(1)
colours_compounds <- c(met_colours,
                su_colours,
                dpp4_colours,  
                glp1_colours,
                sglt2_colours,
                tir_colours) # combine
barplot(rep(1,15), col=colours_compounds, border = "white", axes = FALSE)

Now try again.

dfforplot |> 
  filter(a_min_prpddd<100
         & compound %in% kept_compounds) |> 
  ggplot(aes(x = month, y = a_min_prpddd, colour = compound)) +
  geom_line() +
  scale_colour_manual(values = colours_compounds) +
  scale_x_continuous(breaks = seq.Date(as.Date("2020-01-01"), as.Date("2025-12-01"), by = "year"),
                     labels = format(seq.Date(as.Date("2020-01-01"), as.Date("2025-12-01"), by = "year"), "%Y")) +
  labs(title = "Minimum Pharmacy Retail Price per DDD, in DKK",
       x = "",
       y = "",
       colour = "Drug") +
  hen_theme() 

Now that looks much better. Now let’s make the final graph, using facet_wrap() to distinguish between the minimum and maximum PRP per DDD.

To use facet_wrap(), we pivot the dataset before plotting.

dfforplot_pivot <- dfforplot |> 
  pivot_longer(
    cols = ends_with("prpddd"),
    names_to = "ddd",
    values_to = "minmax"
  ) |> arrange(minmax)

head(dfforplot_pivot)
# A tibble: 6 × 6
  month      atc     compound   class         ddd          minmax
  <date>     <chr>   <fct>      <fct>         <chr>         <dbl>
1 2019-09-01 A10BB12 Glimepirid Sulfonylureas a_min_prpddd  0.135
2 2019-08-01 A10BB12 Glimepirid Sulfonylureas a_min_prpddd  0.146
3 2020-01-01 A10BB12 Glimepirid Sulfonylureas a_min_prpddd  0.155
4 2020-02-01 A10BB12 Glimepirid Sulfonylureas a_min_prpddd  0.155
5 2019-11-01 A10BB12 Glimepirid Sulfonylureas a_min_prpddd  0.156
6 2019-12-01 A10BB12 Glimepirid Sulfonylureas a_min_prpddd  0.156

And then make the plot.

dfforplot_pivot |> 
  filter(compound %in% kept_compounds) |> 
  ggplot(aes(x = month, y = minmax, colour = compound)) +
  geom_line() +
  facet_wrap(
    ~ddd, 
    scales = "free",
    labeller = labeller(ddd =
                          c("a_min_prpddd" = "Minimum",
                            "b_max_prpddd" = "Maximum")
                        )) +
  scale_colour_manual(values = colours_compounds) +
  scale_x_continuous(breaks = seq.Date(as.Date("2020-01-01"), as.Date("2025-12-01"), by = "year"),
                     labels = format(seq.Date(as.Date("2020-01-01"), as.Date("2025-12-01"), by = "year"), "%Y")) +
  labs(title = "Minimum Pharmacy Retail Price per DDD, in DKK",
       x = "",
       y = "",
       colour = "Drug") +
  hen_theme() 

And the final, more presentable plot.

dfforplot_pivot |> 
  filter(minmax<100
         & compound %in% kept_compounds) |> 
  ggplot(aes(x = month, y = minmax, colour = compound)) +
  geom_line() +
  facet_wrap(
    ~ddd, 
    scales = "free",
    labeller = labeller(ddd =
                          c("a_min_prpddd" = "Minimum",
                            "b_max_prpddd" = "Maximum")
                        )) +
  scale_colour_manual(values = colours_compounds) +
  scale_x_continuous(breaks = seq.Date(as.Date("2020-01-01"), as.Date("2025-12-01"), by = "year"),
                     labels = format(seq.Date(as.Date("2020-01-01"), as.Date("2025-12-01"), by = "year"), "%Y")) +
  labs(
    title = "Pharmacy Retail Price per DDD, in DKK",
    caption = "Outliers: Minimum Tirzepatide prices go from",
    x = "",
    y = "",
    colour = "Drug") +
  hen_theme() 

I also want to add information about the outliers on the plot.

JEG HAR GJORT DET MESTE FÆRDIGT OG VIL BARE GERNE SKRIVE DET RENT SENERE

# Filter the dataset for Semaglutide and Tirzepatide
subset_df <- dfforplot_pivot %>%
  filter(compound %in% c("Semaglutid", "Tirzepatid"))

# Summarise the earliest, latest, highest, and lowest values for each compound
summary_df <- subset_df %>%
  group_by(compound) %>%
  summarise(
    earliest_date = min(month),
    latest_date = max(month),
    highest_ddd = max(minmax),
    lowest_ddd = min(minmax),
    .groups = 'drop'  # Ungroup after summarising
  )

# View the resulting dataset
print(summary_df)
# A tibble: 2 × 5
  compound   earliest_date latest_date highest_ddd lowest_ddd
  <fct>      <date>        <date>            <dbl>      <dbl>
1 Semaglutid 2019-07-01    2024-09-01         308.       22.6
2 Tirzepatid 2024-04-01    2024-09-01         522.      103. 
# Create a caption string
caption_text <- paste(
  "Outliers were Semaglutide and Tirzepatide. Lowest PRP per DDD for Semaglutide was",
  "22.6, which increased to 308 in 2024.",
  "For Tirzepatide, the lowest value was 103, and increased to 522 in September of 2024."
  )

# Print caption for verification
print(caption_text)
[1] "Outliers were Semaglutide and Tirzepatide. Lowest PRP per DDD for Semaglutide was 22.6, which increased to 308 in 2024. For Tirzepatide, the lowest value was 103, and increased to 522 in September of 2024."

Doublechecking with my older data.

df_ddd_A10 |> filter(str_detect(compound, "Tirzepatid")) |> summarise(haps = min(prpddd))
# A tibble: 1 × 1
   haps
  <dbl>
1  103.
df_ddd_A10 |> filter(str_detect(compound, "Tirzepatid")) |> summarise(haps = max(prpddd))
# A tibble: 1 × 1
   haps
  <dbl>
1  522.
df_ddd_A10 |> filter(str_detect(compound, "Semaglutid")) |> summarise(haps = min(prpddd))
# A tibble: 1 × 1
   haps
  <dbl>
1  22.6
df_ddd_A10 |> filter(str_detect(compound, "Semaglutid")) |> summarise(haps = max(prpddd))
# A tibble: 1 × 1
   haps
  <dbl>
1  308.

So that makes the final final plot:

dfforplot_pivot |> 
  filter(minmax<100
         & compound %in% kept_compounds) |> 
  ggplot(aes(x = month, y = minmax, colour = compound)) +
  geom_line() +
  facet_wrap(
    ~ddd, 
    scales = "free",
    labeller = labeller(ddd =
                          c("a_min_prpddd" = "Minimum",
                            "b_max_prpddd" = "Maximum")
                        )) +
  scale_colour_manual(values = colours_compounds) +
  scale_x_continuous(breaks = seq.Date(as.Date("2020-01-01"), as.Date("2025-12-01"), by = "year"),
                     labels = format(seq.Date(as.Date("2020-01-01"), as.Date("2025-12-01"), by = "year"), "%Y")) +
  labs(
    title = "Pharmacy Retail Price per DDD, in DKK",
    caption = caption_text,
    x = "",
    y = "",
    colour = "Drug") +
  hen_theme() 

ggsave("thumbnail.png", plot = last_plot(), width = 6, height = 4) # saved as thumbnail